19 research outputs found

    An End-to-End Neural Network for Polyphonic Music Transcription

    We present a neural network model for polyphonic music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony or the number or type of instruments. The acoustic and language model predictions are combined using a probabilistic graphical model. Inference over the output variables is performed using the beam search algorithm. We investigate various neural network architectures for the acoustic models and compare their performance to two popular state-of-the-art acoustic models. We also present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications. We evaluate the model's performance on the MAPS dataset and show that the proposed model outperforms state-of-the-art transcription systems.
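    The abstract leaves the decoding step at a high level, so here is a minimal sketch of the idea: frame-wise pitch probabilities from an acoustic model are scored together with a language-model transition term inside a beam search. The function names, the candidate-generation heuristic, and the stub language model are illustrative assumptions, not the paper's implementation, and the efficient beam-search variant mentioned above is not reproduced.

```python
import numpy as np

def beam_search(acoustic_probs, lm_score, beam_width=8, candidates_per_frame=4):
    """Decode a sequence of pitch combinations from frame-wise probabilities.

    acoustic_probs: (n_frames, n_pitches) array of P(pitch active | frame).
    lm_score: callable(prev_combo, combo) -> log-prior of the transition.
    """
    beams = [([], 0.0)]  # (sequence of pitch combinations, cumulative log-score)
    for frame in acoustic_probs:
        # Enumerate a few likely combinations by taking the top-k pitches.
        order = np.argsort(frame)[::-1]
        combos = [frozenset(order[:k]) for k in range(candidates_per_frame)]
        new_beams = []
        for seq, score in beams:
            prev = seq[-1] if seq else frozenset()
            for combo in combos:
                # Acoustic log-likelihood of this binary pitch assignment.
                on = sum(np.log(frame[p] + 1e-9) for p in combo)
                off = sum(np.log(1.0 - frame[p] + 1e-9)
                          for p in range(len(frame)) if p not in combo)
                new_beams.append((seq + [combo],
                                  score + on + off + lm_score(prev, combo)))
        # Keep only the highest-scoring partial transcriptions.
        new_beams.sort(key=lambda b: b[1], reverse=True)
        beams = new_beams[:beam_width]
    return beams[0]

# Toy usage: random acoustic output and a flat (uninformative) language model.
probs = np.random.rand(10, 88) * 0.2
best_seq, best_score = beam_search(probs, lambda prev, c: 0.0)
```

    In the full system, `lm_score` would be supplied by the RNN language model rather than the flat prior used in this toy call.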

    Learning to Generate Genotypes with Neural Networks

    Neural networks and evolutionary computation have a rich intertwined history. They most commonly appear together when an evolutionary algorithm optimises the parameters and topology of a neural network for reinforcement learning problems, or when a neural network is applied as a surrogate fitness function to aid the evolutionary optimisation of expensive fitness functions. In this paper we take a different approach, asking whether a neural network can be used to provide a mutation distribution for an evolutionary algorithm, and what advantages this approach may offer. Two modern neural network models are investigated: a Denoising Autoencoder modified to produce stochastic outputs, and the Neural Autoregressive Distribution Estimator. Results show that the neural network approach to learning genotypes is able to solve many difficult discrete problems, such as MaxSat and HIFF, and regularly outperforms other evolutionary techniques.
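    As a rough sketch of how a denoising autoencoder can serve as a learned mutation distribution, the toy example below trains a tiny tied-weight DAE on the fitter half of a population and produces offspring by sampling from its stochastic decoded outputs. Everything here (layer sizes, the training loop, and the OneMax toy problem standing in for MaxSat/HIFF) is an illustrative assumption rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """Tiny tied-weight DAE over binary genotypes; decoded activations are
    treated as Bernoulli probabilities, so decoding a parent IS the mutation."""

    def __init__(self, n_bits, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.1, (n_bits, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_v = np.zeros(n_bits)
        self.lr = lr

    def fit(self, X, epochs=50, noise=0.1):
        for _ in range(epochs):
            corrupt = X * (rng.random(X.shape) > noise)  # mask-out noise
            h = sigmoid(corrupt @ self.W + self.b_h)
            out = sigmoid(h @ self.W.T + self.b_v)
            # Cross-entropy gradient, backprop through the tied weights.
            d_out = out - X
            d_h = (d_out @ self.W) * h * (1 - h)
            grad_W = corrupt.T @ d_h + d_out.T @ h
            self.W -= self.lr * grad_W / len(X)
            self.b_v -= self.lr * d_out.mean(0)
            self.b_h -= self.lr * d_h.mean(0)

    def mutate(self, parent):
        h = sigmoid(parent @ self.W + self.b_h)
        p = sigmoid(h @ self.W.T + self.b_v)
        return (rng.random(len(p)) < p).astype(int)  # stochastic output

# Toy EA loop on OneMax: fit the DAE to the fittest half, sample offspring.
n_bits, pop_size = 32, 40
pop = rng.integers(0, 2, (pop_size, n_bits))
for gen in range(30):
    fitness = pop.sum(1)                               # OneMax fitness
    elite = pop[np.argsort(fitness)[-pop_size // 2:]]
    dae = DenoisingAutoencoder(n_bits, 16)
    dae.fit(elite.astype(float))
    pop = np.array([dae.mutate(e) for e in rng.choice(elite, pop_size)])
```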

    An End-to-End Neural Network for Polyphonic Piano Music Transcription

    We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model. Inference over the output variables is performed using the beam search algorithm. We perform two sets of experiments: we investigate various neural network architectures for the acoustic models, and we investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We compare the performance of the neural-network-based acoustic models with two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yield the best performance across all evaluation metrics. We also observe improved performance with the application of the music language models. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.
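    As a rough illustration of a convolutional acoustic model of the kind compared here, the PyTorch sketch below maps a small spectrogram window around each frame to 88 per-pitch probabilities. The layer sizes, input dimensions, and class name are assumptions for illustration; the paper's exact configuration and training details are not reproduced.

```python
import torch
import torch.nn as nn

class ConvAcousticModel(nn.Module):
    """Maps a spectrogram window around each frame to 88 pitch probabilities."""

    def __init__(self, n_bins=229, context=7, n_pitches=88):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * context * (n_bins // 4), 512), nn.ReLU(),
            nn.Linear(512, n_pitches),
        )

    def forward(self, x):          # x: (batch, 1, context, n_bins)
        # Sigmoid rather than softmax: several pitches can sound at once.
        return torch.sigmoid(self.fc(self.conv(x)))

frames = torch.randn(16, 1, 7, 229)         # batch of spectrogram windows
pitch_probs = ConvAcousticModel()(frames)   # (16, 88), each in [0, 1]
```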

    A Hybrid Recurrent Neural Network For Music Transcription

    We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance. We use recurrent neural networks (RNNs) and their variants as music language models (MLMs) and present a generative architecture for combining these models with predictions from a frame-level acoustic classifier. We also compare different neural network architectures for acoustic modeling. The proposed model computes a distribution over possible output sequences given the acoustic input signal, and we present an algorithm for performing a global search for good candidate transcriptions. The performance of the proposed model is evaluated on piano music from the MAPS dataset, and we observe that the proposed model consistently outperforms existing transcription methods.
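    As a sketch of the MLM side, the snippet below uses an LSTM over piano-roll frames to output, for each pitch, the probability that it is active in the next frame; summing the resulting log-probabilities gives a symbolic prior that a search procedure can add to the acoustic score. Sizes and names are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class PitchLSTM(nn.Module):
    """Predicts P(pitch active at t+1 | combinations up to t) for each pitch."""

    def __init__(self, n_pitches=88, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_pitches, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_pitches)

    def forward(self, piano_roll):          # (batch, time, 88) binary frames
        h, _ = self.lstm(piano_roll)
        return torch.sigmoid(self.out(h))   # independent Bernoullis per pitch

# Score a candidate transcription: log-probability of each frame given its past.
roll = (torch.rand(1, 50, 88) < 0.05).float()
pred = PitchLSTM()(roll[:, :-1])            # predictions for frames 1..49
target = roll[:, 1:]
log_prior = (target * torch.log(pred + 1e-9)
             + (1 - target) * torch.log(1 - pred + 1e-9)).sum()
```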

    Chime-home: A dataset for sound source recognition in a domestic environment

    For the task of sound source recognition, we introduce a novel dataset based on 6.8 hours of domestic environment audio recordings. We describe our approach to obtaining annotations for the recordings. Further, we quantify agreement between the obtained annotations. Finally, we report baseline results for sound source recognition using the obtained dataset. Our annotation approach associates each 4-second excerpt from the audio recordings with multiple labels, drawn from a set of 7 labels associated with sound sources in the acoustic environment. With the aid of 3 human annotators, we obtain 3 sets of multi-label annotations for 4378 4-second audio excerpts. We evaluate agreement between annotators by computing Jaccard indices between sets of label assignments. Observing varying levels of agreement across labels, and with a view to obtaining a representation of ‘ground truth’ in the annotations, we refine our dataset to obtain a set of multi-label annotations for 1946 audio excerpts. For this set of 1946 annotated audio excerpts, we predict binary label assignments using Gaussian mixture models estimated on MFCCs. Evaluated using the area under receiver operating characteristic curves, we observe performance scores in the range 0.76 to 0.9 across the considered labels.
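    The Jaccard index used to quantify agreement is simply the size of the intersection of two annotators' label sets divided by the size of their union. A minimal sketch (with hypothetical label names, not the dataset's actual label set) follows:

```python
def jaccard(a, b):
    """Jaccard index between two sets of label assignments."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Labels assigned to the same 4-second excerpt by two annotators.
annotator_1 = {"child speech", "TV", "appliance"}
annotator_2 = {"child speech", "TV"}
print(jaccard(annotator_1, annotator_2))  # 2 / 3 ≈ 0.67
```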

    Automatic environmental sound recognition: Performance versus computational cost

    In the context of the Internet of Things (IoT), sound sensing applications are required to run on embedded platforms where notions of product pricing and form factor impose hard constraints on the available computing power. Whereas Automatic Environmental Sound Recognition (AESR) algorithms are most often developed with limited consideration for computational cost, this article investigates which AESR algorithm can make the most of a limited amount of computing power, by comparing sound classification performance as a function of computational cost. Results suggest that Deep Neural Networks yield the best ratio of sound classification accuracy to computational cost, while Gaussian Mixture Models offer reasonable accuracy at a consistently small cost, and Support Vector Machines stand between the two in terms of the compromise between accuracy and computational cost.
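    As an illustration of the kind of comparison described, the hypothetical sketch below fits a GMM, an SVM, and a small neural network on synthetic features and reports accuracy next to wall-clock inference time, a crude proxy for the operation counts the article actually measures. All data and model settings are made up for illustration.

```python
import time
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for two-class acoustic features (e.g. MFCC frames).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 13)), rng.normal(1, 1, (500, 13))])
y = np.repeat([0, 1], 500)

def report(name, predict):
    t0 = time.perf_counter()
    acc = (predict(X) == y).mean()
    ms = (time.perf_counter() - t0) * 1e3   # wall-clock inference cost
    print(f"{name}: accuracy={acc:.2f}, inference={ms:.1f} ms")

svm = SVC().fit(X, y)
report("SVM", svm.predict)

dnn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, y)
report("DNN", dnn.predict)

# GMM classifier: one mixture per class, choose the higher likelihood.
gmms = [GaussianMixture(n_components=4, random_state=0).fit(X[y == c])
        for c in (0, 1)]
report("GMM", lambda X: (gmms[1].score_samples(X)
                         > gmms[0].score_samples(X)).astype(int))
```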

    Baby Cry Sound Detection: A Comparison of Hand Crafted Features and Deep Learning Approach

    Baby cry sound detection allows parents to be automatically alerted when their baby is crying. Current solutions for the home environment rely on a client-server architecture in which an end-node device streams the audio to a centralized server in charge of the detection. While such solutions provide the best performance, they raise power-consumption and privacy concerns. For these reasons, interest has recently grown in the community in methods that can run locally on battery-powered devices. This work presents a new set of features tailored to baby cry sound recognition, called hand-crafted baby cry (HCBC) features. The proposed method is compared with a baseline using mel-frequency cepstrum coefficients (MFCCs) and a state-of-the-art convolutional neural network (CNN) system. HCBC features prove to be on par with the CNN while requiring less computational effort and memory space, at the cost of being application-specific.
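    For context, an MFCC baseline of the kind the HCBC features are compared against might look like the sketch below (librosa-based, with assumed parameters and clip statistics; the paper's exact front-end and classifier are not reproduced).

```python
import numpy as np
import librosa

def mfcc_features(audio, sr, n_mfcc=13):
    """Frame-wise MFCCs plus simple clip-level statistics, the kind of
    generic front-end a hand-crafted feature set is compared against."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    # Summarise each coefficient over the clip (mean and deviation).
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Toy usage on a synthetic 1-second clip; a real detector would feed these
# features (or the HCBC alternatives) to a lightweight classifier.
sr = 16000
clip = np.random.randn(sr).astype(np.float32)
features = mfcc_features(clip, sr)   # shape: (26,)
```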